A cost-based analysis for risk-averse explore-then-commit finite-time bandits

Authors

Abstract

In this article, a multi-armed bandit problem is studied in an explore-then-commit setting where the cost of pulling arms in the experimentation (exploration) phase may not be negligible. The goal of the problem is to identify the best arm after a pure experimentation phase in order to exploit it once or for a given finite number of times. Applications are prevalent in personalized health-care and financial investments, where the frequency of exploitation is limited. In this setting, we observe that the arm with the highest expected reward is not necessarily the most desirable one to exploit. Alternatively, we advocate the idea of risk aversion, where the objective is to compete against the arm with the best risk-return trade-off. Additionally, a trade-off between cost and regret should be considered in the case where pulling arms in the exploration phase incurs a cost. For the case where exploration cost is not considered, we propose a class of hyper-parameter-free risk-averse algorithms, called OTE/FTE-MAB (One/Finite-Time Exploitation Multi-Armed Bandit), whose objective is to select the arm that is most probable to reward the most in a single or a finite number of exploitations. To analyze these algorithms, we define a new notion of finite-time exploitation regret for our setting of interest. We provide an upper bound of order ln(1/ε_r) on the minimum number of experiments that should be done to guarantee an ε_r regret. Compared to existing risk-averse bandit algorithms, our algorithms do not rely on hyper-parameters, resulting in more robust behavior in practice. For the case where pulling arms has a cost, we propose the c-OTE-MAB algorithm for two-armed bandits, which addresses the cost-regret trade-off, corresponding to the exploration-exploitation trade-off, by minimizing a linear combination of cost and regret, called the cost-regret function, using a hyper-parameter. This algorithm determines an estimate of the optimal number of explorations whose cost-regret value approaches the minimum value of the cost-regret function at a rate of 1/√n_e with an associated confidence level, where n_e is the number of explorations of each arm.
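To make the explore-then-commit pattern concrete, here is a minimal Python sketch of a risk-averse variant: explore every arm for n_e rounds, then commit to the arm whose exploration samples most often win a single-round comparison against the best rival draw. The selection rule and all names are illustrative assumptions, not the paper's exact OTE/FTE-MAB procedure or its regret analysis.

```python
import numpy as np

def risk_averse_etc(arms, n_explore, seed=0):
    """Illustrative risk-averse explore-then-commit (an assumption-based
    sketch, not the paper's OTE-MAB rule).  'arms' is a list of callables,
    each returning one stochastic reward; 'n_explore' is the exploration
    budget per arm (the n_e of the abstract)."""
    rng = np.random.default_rng(seed)
    k = len(arms)
    # Exploration phase: pull every arm n_explore times.
    samples = np.array([[arm(rng) for _ in range(n_explore)] for arm in arms])
    # Risk-averse selection: for each arm, estimate the probability that its
    # draw beats the best draw of all rival arms in the same round, i.e.,
    # the arm most likely to win a *single* exploitation.
    rivals_best = np.stack([np.delete(samples, i, axis=0).max(axis=0)
                            for i in range(k)])
    win_prob = (samples >= rivals_best).mean(axis=1)
    return int(np.argmax(win_prob))  # arm to commit to

# Usage: arm 0 has the higher mean (2.0 vs. 1.0) but pays off rarely, so the
# low-variance arm 1 wins a one-shot comparison about 80% of the time.
arms = [lambda rng: 10.0 * rng.binomial(1, 0.2),
        lambda rng: rng.normal(1.0, 0.1)]
print("commit to arm", risk_averse_etc(arms, n_explore=1000))
```

This illustrates the abstract's observation that the highest-mean arm is not necessarily the most desirable one for a single or finite number of exploitations.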


Similar resources

FPL Analysis for Adaptive Bandits

A main problem of "Follow the Perturbed Leader" strategies for online decision problems is that regret bounds are typically proven against an oblivious adversary. In partial observation cases, it was not clear how to obtain performance guarantees against an adaptive adversary without worsening the bounds. We propose a conceptually simple argument to resolve this problem. Using this, a regret bound o...
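For orientation, a generic sketch of the Follow the Perturbed Leader idea the snippet refers to, under standard full-information assumptions; the paper's contribution, the adaptive-adversary analysis under partial observation, is not reproduced here.

```python
import numpy as np

def fpl_choices(loss_matrix, eta=0.1, seed=0):
    """Follow the Perturbed Leader (generic full-information sketch).

    loss_matrix: (T, K) array with the loss of each of K actions per round.
    Each round, play the action minimizing cumulative past loss minus an
    i.i.d. exponential perturbation with scale 1/eta.
    """
    rng = np.random.default_rng(seed)
    T, K = loss_matrix.shape
    cum_loss = np.zeros(K)
    choices = []
    for t in range(T):
        perturbation = rng.exponential(1.0 / eta, size=K)
        choices.append(int(np.argmin(cum_loss - perturbation)))
        cum_loss += loss_matrix[t]  # full loss vector observed after playing
    return choices
```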


A time-series analysis of the demand for life insurance in Iran

Based on our analysis of the data, we find that income level and the number of insurance agencies are directly (positively) related to the demand for life insurance, while the interest rate and the dependency burden are inversely related to it.

Analysis of ruin probability for insurance companies using Markov chains

In this thesis, we show how the Sparre Andersen insurance risk model can be defined by means of Markov chains. Then, using matrix-analytic methods, we compute the probability of ruin, the surplus at the time of ruin, and the deficit at the time of ruin. Our approach in this thesis is considerably more computational and applied than the methods previously proposed for computing this probability. First, we sho...
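As a rough illustration of the Markov-chain viewpoint, here is a generic truncated-chain computation of a ruin probability for a simple discrete-time risk model; this is an assumption-based sketch, not the thesis's matrix-analytic treatment of the Sparre Andersen model.

```python
import numpy as np

def ruin_probability(u0, premium, claim_pmf, horizon=10_000, max_surplus=200):
    """Ruin probability for a discrete-time surplus process on a truncated
    Markov chain.  claim_pmf: dict {claim_size: probability} per period.
    Surplus evolves as u <- u + premium - claim; ruin once u < 0."""
    M = max_surplus
    # States 0..M are surplus levels; state M+1 is the absorbing ruin state.
    P = np.zeros((M + 2, M + 2))
    P[M + 1, M + 1] = 1.0
    for u in range(M + 1):
        for x, p in claim_pmf.items():
            v = u + premium - x
            if v < 0:
                P[u, M + 1] += p            # ruined this period
            else:
                P[u, min(v, M)] += p        # cap surplus at the truncation M
    dist = np.zeros(M + 2)
    dist[min(u0, M)] = 1.0
    for _ in range(horizon):                # iterate the chain forward
        dist = dist @ P
    return dist[M + 1]                      # probability mass absorbed in ruin

# Example: premium 2 per period, claims 0/1/5 with probabilities 0.5/0.3/0.2.
print(ruin_probability(u0=5, premium=2, claim_pmf={0: 0.5, 1: 0.3, 5: 0.2}))
```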


Analysis of Thompson Sampling for Stochastic Sleeping Bandits

We study a variant of the stochastic multiarmed bandit problem where the set of available arms varies arbitrarily with time (also known as the sleeping bandit problem). We focus on the Thompson Sampling algorithm and consider a regret notion defined with respect to the best available arm. Our main result is an O(log T) regret bound for Thompson Sampling, which generalizes a similar bound known ...
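A minimal sketch of Thompson Sampling restricted to the arms that are awake in each round, with Bernoulli rewards and Beta(1,1) priors; the setup and names are illustrative assumptions, and the paper's O(log T) analysis (against the best available arm) is not reproduced here.

```python
import numpy as np

def sleeping_thompson(reward_fn, availability, K, T, seed=0):
    """Thompson Sampling over only the currently available ('awake') arms.

    availability: function t -> iterable of available arm indices.
    reward_fn: function (t, arm) -> 0/1 reward.
    """
    rng = np.random.default_rng(seed)
    alpha = np.ones(K)  # Beta posterior: 1 + successes
    beta = np.ones(K)   # Beta posterior: 1 + failures
    total = 0.0
    for t in range(T):
        awake = list(availability(t))
        # Sample a mean estimate for each awake arm; play the best sample.
        theta = rng.beta(alpha[awake], beta[awake])
        arm = awake[int(np.argmax(theta))]
        r = reward_fn(t, arm)
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total

# Example: 3 Bernoulli arms; the best arm (index 2) sleeps on odd rounds.
means = [0.3, 0.5, 0.8]
avail = lambda t: [0, 1, 2] if t % 2 == 0 else [0, 1]
rew = lambda t, a, rng=np.random.default_rng(1): int(rng.random() < means[a])
print(sleeping_thompson(rew, avail, K=3, T=2000))
```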


PAC-Bayesian Analysis of Contextual Bandits

We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) N goes as ...



Journal

Journal title: IISE Transactions

Year: 2021

ISSN: 2472-5854, 2472-5862

DOI: https://doi.org/10.1080/24725854.2021.1882014